Using a support vector machine and a land surface model to estimate large-scale passive microwave temperatures over snow-covered land in North America

Barton A. Forman and Rolf H. Reichle

Abstract

A support vector machine (SVM), a machine learning technique developed from statistical learning theory, is employed to estimate passive microwave (PMW) brightness temperatures over snow-covered land in North America as observed by the Advanced Microwave Scanning Radiometer (AMSR-E) satellite sensor. The capability of the trained SVM is compared with the artificial neural network (ANN) estimates originally presented in [14]. The results suggest the SVM outperforms the ANN at 10.65 GHz, 18.7 GHz, and 36.5 GHz for both vertically- and horizontally-polarized PMW radiation. When compared against daily AMSR-E measurements not used during the training procedure and subsequently averaged across the North American domain over the 9-year study period, the root mean squared error in the SVM output is 8 K or less while the anomaly correlation coefficient is 0.7 or greater. When compared with the results from the ANN at any of the six frequency and polarization combinations tested, the root mean squared error was reduced by more than 18% while the anomaly correlation coefficient was increased by more than 52%. Further, the temporal and spatial variability in the modeled brightness temperatures via the SVM more closely agrees with that found in the original AMSR-E measurements. These findings suggest the SVM is a superior alternative to the ANN for eventual use as a measurement operator within a data assimilation framework.

Index Terms

AMSR-E, brightness temperature, modeling, support vector machines, remote sensing, passive microwave, snow

I. Introduction and Background

Snow is a critical component of the hydrologic cycle because of its influence on land surface albedo [19], its control on land surface water and energy balances [31], and its impact on weather and climate [4], [17]. Snow also serves as the dominant source of freshwater supply for more than one billion people globally [3], [16]. Direct quantification of the mass of snow, or snow water equivalent (SWE), however, is complicated by significant spatial and temporal variability such that sparse, ground-based observation networks cannot always capture the spatiotemporal heterogeneity of SWE. In response, researchers have begun using space-based instrumentation in conjunction with land surface models (LSMs) in an effort to better quantify this vital resource.

Data assimilation can be used to merge satellite-derived measurements with physically-based LSMs [9], [10], [13], [29] by weighing the uncertainties in each in order to yield a merged estimate superior to the measurements or the model alone [25]. In this process, it is necessary to map the relevant model state variables into the corresponding measurement space. In the context of snow data assimilation, this can involve mapping model state variables into passive microwave (PMW) brightness temperature (Tb) space [2], [10], [12] using a physically-based radiative transfer model (RTM) [27], [35], [36]. However, LSMs operating at regional and continental scales do not possess the fidelity to provide the necessary inputs required by the RTM [11], and as such, previous PMW Tb studies have been limited to point-scale or basin-scale applications [2], [10], [12].

Recent research has explored the use of machine learning as an efficient alternative to an RTM in order to map model state variables into PMW Tb space. It was shown that an artificial neural network (ANN) could effectively diagnose PMW Tb at multiple frequencies and multiple polarizations across regional and continental scales [14]. Further, these results were unbiased over the 9-year study period, demonstrated significant skill during both the accumulation (i.e., when the snow is relatively dry) and ablation (i.e., when the snow is relatively wet) phases of the snow season, and yielded a domain-averaged root mean squared error (RMSE) less than 10 K at all frequency and polarization combinations investigated in the study. The findings of [14] were the first to demonstrate the potential of using an ANN as a measurement operator to estimate PMW Tb over snow-covered land with the eventual goal of applying it in a large-scale SWE data assimilation framework.

This current study expands on the work of [14] by investigating an alternative form of machine learning. Namely, the objective of this study is to explore the utilization of a support vector machine (SVM) for nonlinear regression as applied to PMW Tb estimation over snow-covered land, and to contrast the results against those generated by the ANN presented in [14]. SVMs are similar to ANNs in that both forms of machine learning are skilled at reproducing nonlinear processes [8], [26], [39]. However, there are also differences in performance between SVMs and ANNs. For example, if the problem is strictly convex, then the solution to the SVM optimization problem is unique. With convex constrained optimization problems, it has also been shown that SVMs are not plagued with the problem of local minima as are ANNs [32]. Further, a number of resampling procedures are available [8] that easily allow for the proper selection of SVM parameters without the need for an "expert" user to decide a priori what the SVM parameters should be, which is contrary to the general ANN application case.

The SVM methodology and experimental domain used in this study are outlined in section II and the appendix, the approach to validate the results is discussed in section III, the results are presented in section IV, and the major findings and conclusions of this study are highlighted in section V.

II. Methodology

A. SVM Solution

Consider a [1 x n] input vector, y, where n = 11 is the number of geophysical variables that characterize snow and near-surface environmental conditions at a given location in space and time. In this study, y is derived from a land surface model simulation (further details provided in section II-B). Once trained on Tb observations, a nonlinear SVM can be used to estimate Tb at a given frequency and polarization for a particular location in space and time as a function of y via the approximating function

$$f(\mathbf{y}) = \sum_{i=1}^{m} \left(\alpha_i - \alpha_i^{*}\right) k(\mathbf{x}_i, \mathbf{y}) + \delta \tag{1}$$

where $\alpha$ and $\alpha^{*}$ are the [m x 1] sets of dual Lagrangian multipliers, $k(\mathbf{x}_i, \mathbf{y})$ is the radial basis kernel function computed as $k(\mathbf{x}_i, \mathbf{y}) = \exp\{-\gamma \|\mathbf{x}_i - \mathbf{y}\|^2\}$, x is the [m x n] training matrix, $\delta$ is the "bias" coefficient, and m is the number of training targets. The variables $\alpha$, $\alpha^{*}$, and $\delta$ along with the corresponding set of support vectors are all defined during training, which is discussed in more detail in the appendix. It is worth noting here that x and y are computed with the same land surface model, but that the two sets are drawn from different periods of time and can therefore be considered independent. Once the approximating function is specified and the SVM has been trained, equation (1) provides a straightforward and computationally inexpensive method to estimate Tb as a function of time given temporally varying near-surface conditions from the land surface model simulation.
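As an illustration of how equation (1) is evaluated once training is complete, the following minimal sketch computes the radial basis kernel and the resulting estimate in plain Python. The function names, the toy support vectors, and all numerical values are hypothetical and are not taken from the study; the study itself used the LIBSVM library for this step.

```python
import math

def rbf_kernel(x, y, gamma):
    """Radial basis kernel k(x, y) = exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return math.exp(-gamma * sq_dist)

def svm_estimate(y, support_vectors, dual_coefs, delta, gamma):
    """Evaluate equation (1) for one input vector y.

    dual_coefs[i] holds the difference (alpha_i - alpha_i*) for the
    i-th support vector; delta is the bias coefficient.
    """
    return delta + sum(
        coef * rbf_kernel(x_i, y, gamma)
        for x_i, coef in zip(support_vectors, dual_coefs)
    )

# Toy values (hypothetical): two support vectors with n = 2 inputs each.
support_vectors = [[0.0, 0.0], [1.0, 1.0]]
dual_coefs = [2.0, -1.0]
tb = svm_estimate([0.0, 0.0], support_vectors, dual_coefs, delta=250.0, gamma=0.5)
```

Only training vectors with a nonzero coefficient difference contribute to the sum, which is why evaluation is inexpensive relative to training.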

B. SVM Inputs and Outputs

Inputs to the SVM are identical to those used in the ANN study. For brevity, only the essential details are discussed here; additional details may be found in [14]. Inputs to the SVM included a number of land surface state estimates derived from the NASA Catchment land surface model (Catchment) [21] and are listed in Table I. State variable estimates from Catchment, in general, comprise 1) snow conditions and 2) near-surface air, soil, and vegetation temperatures. The Catchment model was forced by surface meteorological fields acquired from the Modern-Era Retrospective Analysis for Research and Applications (MERRA) product [30]. Daily-averaged Catchment output was generated on the Equal Area Scalable Earth (EASE) grid at a 25 km x 25 km horizontal resolution. AMSR-E measurements used as training targets and as independent validation were derived on the same 25-km EASE grid; the AMSR-E Tb measurements are discussed in more detail in section II-C1. The LIBSVM library [6] was employed for all SVM training and estimation activities in this study.

C. SVM Training

1) Training Targets: The SVM was trained using AMSR-E measurements collected at three different frequencies (10.65 GHz, 18.7 GHz, and 36.5 GHz) at both horizontal and vertical polarization. The resulting combination of the three frequencies and two polarizations yielded a total of six different sets of training targets (or outputs) as listed in Table I. These frequency and polarization combinations were selected due to their sensitivity to snow [5], [16], [19] and because the same combinations were used in [14]. The latter enables a direct comparison between ANN and SVM performance, which is one of the main objectives of this study. Three additional AMSR-E channels (6.9 GHz, 23.9 GHz, and 89.0 GHz) were available for use but were not employed in the SVM framework. This was done in part to maintain continuity with the ANN study and in part due to physical limitations associated with particular frequencies. For example, the 89.0 GHz channel was avoided due to significant atmospheric effects [7] and limitations associated with precipitating clouds [24]. In addition, even though the 23.9 GHz channel has a penetration depth into the snowpack that lies between that of the 18.7 GHz and 36.5 GHz channels, and could therefore provide additional information about snow conditions, its use was avoided due to significant interactions with atmospheric water vapor. Finally, as in [14], the 6.9 GHz channel was excluded because its effective field of view (75 km x 43 km for the 3 dB footprint; [1]) is much greater than the grid spacing of the 25 km x 25 km EASE-grid product and because it is relatively insensitive to terrestrial snow [5]. Additional evidence suggests the 6.9 GHz channel is negatively impacted by radio frequency interference [18], which further motivates its exclusion from the selected training targets.

It has been demonstrated that forest cover attenuates PMW emission from the underlying snowpack while simultaneously adding its own contribution to the radiation as measured by the radiometer [34]. Recent research has further shown that AMSR-E snow retrievals that employ PMW Tbs at 36 GHz are adversely impacted by forest effects and that correction strategies can be applied using radiation transfer theory [22]. In this present study, no such correction strategies have been applied. In other words, the AMSR-E Tb measurements used during training (as well as the Tb estimates generated by the trained SVM) over forested regions contain contributions from both the snow and the vegetative canopy. Vegetation corrections were excluded from this study in order to maintain continuity with the approach outlined in [14]. All AMSR-E Tbs used in this study were obtained from http://nsidc.org/data/nsidc-0301.html and are highlighted in [20].

2) Training Approach: An SVM was generated for each Tb frequency and polarization combination listed in Table I. Each SVM was trained separately and independently at each grid cell on the 25 km EASE grid using the available measurements collected by AMSR-E during the 9-year period from 1 September 2002 to 1 September 2011. This 9-year period encompasses approximately 98% of the available AMSR-E data prior to 4 October 2011 when a problem associated with the rotation of the AMSR-E antenna occurred and regular science data collection ceased. Each SVM was trained for a two-week (fortnight) period. This approach was used to address the strong seasonality in snow processes [14].

For a given fortnight in a given year, training activities employed the AMSR-E observations for the given fortnight from the other eight years in the training record. That is, training cycled through the 9-year period withholding each year in turn. Consequently, the AMSR-E measurements for the withheld year, which were not used for training, were later utilized during the validation activities discussed below in section III. An identical procedure to define the training dataset was used in the ANN study as discussed in [14].
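The year-withholding scheme described above can be sketched as follows. This is a minimal illustration only: the record layout (dictionaries with `year`, `fortnight`, and `tb` keys), the labeling of each snow season by its starting year, and the synthetic values are all assumptions made for the example.

```python
def split_for(records, fortnight, held_out_year):
    """Split the records of one fortnight into training and validation sets.

    Training uses the given fortnight from every year except the held-out
    year; validation uses the same fortnight from the held-out year only.
    """
    in_fortnight = [r for r in records if r["fortnight"] == fortnight]
    train = [r for r in in_fortnight if r["year"] != held_out_year]
    valid = [r for r in in_fortnight if r["year"] == held_out_year]
    return train, valid

# Synthetic records (hypothetical): nine seasons, three fortnights each.
years = range(2002, 2011)
records = [
    {"year": y, "fortnight": f, "tb": 240.0 + f}
    for y in years
    for f in range(3)
]

train, valid = split_for(records, fortnight=1, held_out_year=2005)
```

Looping `held_out_year` over all nine seasons reproduces the cycling described in the text, so that every measurement serves as validation data exactly once and never for a model it helped train.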

Tests were conducted using fewer than eight years of training data; however, the results from these tests (not shown) suggested that, in general, SVM performance improved as more training data were made available. In addition, in order to enhance continuity from one fortnight to the next, a temporal overlap of two weeks was included at both the beginning and end of each training period. Only measurements collected during the nighttime AMSR-E overpass (roughly between 01:00 and 01:30 hours local time) were used during training in order to minimize wet snow effects.

The SVM training procedure consisted of a two-fold training process (similar to that used during ANN training) in an effort to enhance SVM robustness. The two-fold procedure involved the selection of a subset (approximately 50%) of the 8-year training data with which the SVM was first trained. (Note that the training data discussed here are separate from the independent validation data mentioned above and discussed below in section III.) The trained SVM was then used to reproduce the subset of training data, and the mean square error (MSE) was computed between the SVM estimates and the training data subset. The process employing the first subset of training data was repeated across a range of values for the SVM parameters $\varepsilon$ and $\gamma$ (Appendix), each time computing (and storing) the resulting MSE. This procedure was then repeated using the remaining training data (i.e., the other 50%) such that no reuse of training data occurred during the two-fold process. As conducted with the first subset of training data, the SVM was trained across a range of $\varepsilon$ and $\gamma$ values and the MSE was computed. The combination of $\varepsilon$ and $\gamma$ values that yielded the closest agreement (in a mean-square sense) across the two training exercises was ultimately selected for use during the final SVM training procedure, which employed the entire (8-year) training data set. This final SVM was then used for the remainder of the comparisons described below. Additional tests ranging from a two-fold process up to a ten-fold process were conducted without any significant improvement found beyond the two-fold process. Therefore, the two-fold procedure was ultimately adopted as it incurred the least amount of computational expense without any sacrifice in SVM performance while also maintaining continuity with the ANN study.
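The two-fold parameter search can be sketched generically as below. Several choices here are assumptions rather than details from the study: the data are split into two contiguous halves, candidates are ranked by the sum of the two MSE values, and the trainer is passed in as a function. The `smoother_train` trainer used in the demonstration is a hypothetical stand-in (an RBF-weighted average, not an SVM) that ignores $\varepsilon$; in practice a LIBSVM-backed trainer would take its place.

```python
import itertools
import math

def mse(predict, X, t):
    """Mean squared error of a predictor over inputs X with targets t."""
    return sum((predict(x) - ti) ** 2 for x, ti in zip(X, t)) / len(t)

def two_fold_select(X, t, train_fn, eps_grid, gamma_grid):
    """Pick (eps, gamma) by training on each half of the data in turn.

    Following the text, a model trained on one half is scored on how well
    it reproduces that same half. Summing the two MSE values to rank the
    candidates is an assumption of this sketch.
    """
    half = len(X) // 2
    folds = [(X[:half], t[:half]), (X[half:], t[half:])]
    scores = {}
    for eps, gamma in itertools.product(eps_grid, gamma_grid):
        scores[(eps, gamma)] = sum(
            mse(train_fn(Xf, tf, eps, gamma), Xf, tf) for Xf, tf in folds
        )
    return min(scores, key=scores.get), scores

def smoother_train(X, t, eps, gamma):
    """Hypothetical stand-in trainer (NOT an SVM; eps is ignored)."""
    def predict(y):
        w = [math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y))) for x in X]
        return sum(wi * ti for wi, ti in zip(w, t)) / sum(w)
    return predict

# Toy data (hypothetical): one input variable, six samples.
X = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]]
t = [250.0, 248.0, 245.0, 246.0, 243.0, 240.0]
best, scores = two_fold_select(
    X, t, smoother_train, eps_grid=[0.5], gamma_grid=[0.01, 1.0, 100.0]
)
```

With this stand-in the narrowest kernel wins because each model is scored on its own training half; scoring each model on the opposite half instead would give the conventional cross-validation variant. The selected pair would then be used to train once more on the full training set.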

D. Study Domain

The study domain shown in Figure 1 encompasses the North American continent poleward of 32°N and is identical to that used in the ANN study [14]. This region was selected because the domain includes all the major snow classes (tundra, taiga, maritime, prairie, alpine, and ephemeral) as defined in [33]. The 9-year study period (1 September 2002 to 1 September 2011) corresponds to nearly the entire AMSR-E measurement record and is likewise identical to that used in the ANN study [14].

III. Validation Approach

Validation of the SVM-derived estimates involved the use of the original AMSR-E measurements not used during SVM training (see section II-C2 for more details). For any year of interest, the validation set of AMSR-E measurements is completely separate from the training datasets, and therefore constitutes a valid, independent comparison. Several different validation metrics were employed: 1) bias of the estimator, bias; 2) root mean squared error, RMSE, which includes the bias; and 3) anomaly correlation coefficient, anomaly R. The first two metrics were calculated from an original (i.e., "raw") time series. The anomaly R metric, on the other hand, was calculated from an anomaly time series after the respective climatological (multi-year average) seasonal cycle was subtracted from each respective data set. Each metric was computed separately at each grid cell (based on daily data). Area-averaged metrics were computed by averaging the metrics across the snow-covered grid cells. In order to compute meaningful statistics, a number of constraints were enforced to ensure that time series of sufficient length were available. For example, snow must be present at a given location at least 5% of the year. As a result, the number of data points used in the statistical calculations shown for a grid cell ranged from a minimum of 164 along the southern boundary of the snow-covered area to more than 2500 near the northern edge of the study domain. It is well recognized that the AMSR-E measurements contain error (standard deviation of ~1 K according to http://nsidc.org/data/docs/daac/amsre_instrument.gd.html), but this error is small when compared to the uncertainty in the SVM and ANN output (relative to the AMSR-E measurements) and is therefore neglected here. An identical approach was employed in [14] during the original ANN investigation and is similarly applied here to the SVM.
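The three metrics can be sketched for a single grid cell as follows. This is a minimal illustration under stated assumptions: the climatological seasonal cycle is taken as the plain multi-year mean for each day of year (the study may have constructed it differently), and the function names and toy series are hypothetical.

```python
import math

def bias(est, obs):
    """Mean difference between estimates and observations."""
    return sum(e - o for e, o in zip(est, obs)) / len(obs)

def rmse(est, obs):
    """Root mean squared error; any bias is included in this value."""
    return math.sqrt(sum((e - o) ** 2 for e, o in zip(est, obs)) / len(obs))

def anomalies(series, day_of_year):
    """Subtract the multi-year mean for each day of year (assumed climatology)."""
    sums, counts = {}, {}
    for v, d in zip(series, day_of_year):
        sums[d] = sums.get(d, 0.0) + v
        counts[d] = counts.get(d, 0) + 1
    return [v - sums[d] / counts[d] for v, d in zip(series, day_of_year)]

def correlation(a, b):
    """Pearson correlation coefficient of two equal-length series."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def anomaly_r(est, obs, day_of_year):
    """Correlation of the two series after removing each one's seasonal cycle."""
    return correlation(anomalies(est, day_of_year), anomalies(obs, day_of_year))

# Toy series (hypothetical): two years, three days each.
doy = [1, 2, 3, 1, 2, 3]
obs = [250.0, 245.0, 240.0, 252.0, 243.0, 241.0]
est = [o + 2.0 for o in obs]  # estimates offset by a constant 2 K

b = bias(est, obs)
e = rmse(est, obs)
r = anomaly_r(est, obs, doy)
```

In this toy case the constant offset appears in both the bias and the RMSE but leaves anomaly R unaffected, which illustrates why the anomaly metric isolates skill in capturing variability rather than mean state.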

IV. Results

Assessment of SVM capability in estimating AMSR-E Tbs included comparisons of both SVM and ANN output relative to AMSR-E measurements not used during training activities. These comparisons included statistical maps for the 9-year study period, which yielded a large-scale analysis of SVM performance relative to the ANN (subsection IV-A). In addition, time series investigations (subsection IV-B) are provided at several different locations (location markers provided in Figure 1) over the course of an entire snow season. The time series investigation provided evidence as to the capability of the machine learning techniques at reproducing AMSR-E measurements during both the snow accumulation portion of a snow season and the subsequent ablation phase. Moreover, a brief investigation of the spatial and temporal variability of the machine learning estimates is provided in subsection IV-C in order to highlight each technique's skill at reproducing the variability in the original AMSR-E measurements. Finally, the potential for employing the SVM within a data assimilation framework (subsection IV-D) is briefly highlighted via investigation of the resulting Kalman gain matrix. As in [14], most discussions focus on the 18V and 36V results because these channels are considered the most informative when viewed in the context of SWE estimation [5]. However, it is worth noting here that all frequency and polarization combinations listed in Table I were investigated and analyzed in a similar fashion as the 18V and 36V results.

A. Cross-validation

Figure 2 provides a large-scale overview of SVM versus ANN performance at 18V over the course of the 9-year study period. Each subplot represents a statistical map for either the SVM output (left column) or the ANN output (middle column) computed relative to the AMSR-E measurements not used during training. The top row illustrates the bias in the SVM estimates (Figure 2a), the bias in the ANN estimates (Figure 2b), and the difference between the two (Figure 2c). Analogously, the middle row highlights the computed RMSE whereas the bottom row highlights the computed anomaly R.

In terms of bias, both the SVM and ANN yield relatively unbiased estimates when averaged over the entire study domain across the 9-year study period. The SVM estimates contain approximately 1 K more positive bias (relative to the ANN estimates) in regions surrounding Hudson Bay, across northern Quebec, and in western Alaska near the Bering Sea. Conversely, the SVM contains approximately 1-2 K more negative bias in regions covered by boreal forest. Figure 2c further highlights the increase in the magnitude of bias in the SVM output (relative to the ANN estimates), but this bias is small when compared to the temporal variability of the original AMSR-E measurements (further discussion provided in section IV-C) and, in general, falls within the estimated error standard deviation (~1 K according to http://nsidc.org/data/docs/daac/amsre_instrument.gd.html).

Despite the small increase in bias generated by the SVM relative to the ANN output, results provided in Figures 2d-f show the SVM contains significantly less RMSE than the ANN. The reduction in RMSE within the SVM estimates is witnessed across the entire study domain, including regions with and without significant vegetative cover, and is most apparent in regions where sub-grid scale lakes (i.e., lakes smaller than the 25-km EASE pixel size) are common. Additional reductions in RMSE are also found along the southern periphery of the snow line where the snow pack is thin and ephemeral and where freeze-thaw cycles are relatively common [14].

Figure 3a presents box plots of computed RMSE for all frequency and polarization combinations examined in this study. It is clear that the SVM yields a reduction in computed RMSE relative to the ANN results. When viewed from the perspective of the median value, SVM-derived RMSE is reduced, on average, by ~20-25% from the ANN-derived results. In addition, the extreme values (i.e., the 90th percentiles) are greatly reduced such that the SVM yields more stable and more accurate results when compared to the ANN estimates for the same study period and study domain.

The final set of statistics provided in Figures 2g-i shows the computed anomaly R over the 9-year study period. Anomaly R is useful in that it focuses on the capability of each technique to capture the synoptic-scale and inter-annual variability of the Tb estimates across the entire spatial domain. As is clearly seen, the SVM-based estimates are superior to those derived from the ANN. In particular, the anomaly R in regions to the north and south of the boreal forest is nearly doubled from ~0.4 to ~0.8. Within forested regions, Tb as measured by AMSR-E includes PMW emission from the forest canopy [22]. The ANN benefits greatly from model-derived skin temperature, Tsurf, within the forest canopy, which yields much greater anomaly R values in regions where significant forest cover is present [14]. However, in regions where significant forest cover is not present, ANN-based performance as a function of time is drastically reduced. The SVM, on the other hand, is able to better utilize the full set of input variables outlined in Table I across a broader range of conditions, including both forested and non-forested areas. The dramatic improvements in anomaly R values computed from the SVM for the other evaluated frequency and polarization combinations are further witnessed in Figure 3b. The SVM is clearly able to capture much more of the temporal variability found in the original AMSR-E Tb measurements. As was the case with the RMSE results, the median anomaly R values based on the SVM estimates are better for the vertically-polarized channels when compared to the horizontally-polarized channels, but these differences are relatively small and suggest that, in general, the SVM outperforms the ANN across space and time at all frequency and polarization combinations evaluated in this study.

B. Time Series Investigation

Results presented thus far focused on the time-integrated behavior of the SVM over the 9-year study period and its performance relative to that of the ANN. A time series investigation is discussed here in order to better illustrate the performance of the SVM throughout the snow season at a handful of representative locations. The goal of this investigation is to highlight the capability of the SVM to estimate Tb during the snow accumulation season when the snow is dry (and hence acts as an efficient scatterer) as well as during the snow ablation season when the snow is relatively wet (and hence acts as a relatively efficient emitter). The 2003-2004 snow season was selected for analysis because it is representative of a typical snow season during the 9-year study period.

Figure 4 highlights Tb time series for three different locations (shown as red circles in Figure 1). These particular locations were chosen because they represent the most dominant snow classifications (in terms of North American coverage in Figure 1) and because these three locations represent a range of different vegetative covers as well as maximum snow depths at peak accumulation. Namely, the first subplot (Figure 4a) is for a location with relatively shallow snow and little vegetative cover, the second subplot (Figure 4b) is for a location with moderate snow depth and relatively thick vegetative cover, and the third subplot (Figure 4c) is for a location with relatively deep snow and a modest amount of vegetative cover. The short gap in all time series in early November 2003 is due to missing AMSR-E observations. The presence of a solid line (ANN or SVM) indicates the presence of snow as modeled by Catchment.

As is shown, both the ANN and the SVM do a reasonable job at reproducing the AMSR-E measurements not used during training. Both machine learning techniques capture the large-scale features present in the AMSR-E Tb measurements, including both the accumulation and ablation phases of the snow season. However, clear differences between the ANN and SVM estimates are also seen. Namely, the SVM does a much better job of capturing the high frequency (i.e., day-to-day) variability associated with synoptic scale processes. The ANN estimates, on the other hand, often lack this high frequency variability as is witnessed by the step function-like features present during portions of the snow season at each of the three locations. These clear differences in the ANN versus SVM estimated variability over time scales of a few days to a week corroborate the anomaly R results highlighted in Figures 2g-i and Figure 3b. These findings suggest the ANN output is less sensitive to certain changes in the modeled inputs whereas the SVM output is significantly more sensitive to changes in the modeled inputs as a function of time, and hence, yields Tb estimates that capture more of the high frequency temporal variability. Similar features are also found in the 10H, 10V, 18H, and 36H Tb estimates (results not shown).

An additional note of interest regards the presence of snow as modeled by Catchment, which is used as input to both the ANN and SVM (Table I). The AMSR-E measurements shown in Figures 4a and 4b suggest the presence of snow when the difference between 18V and 36V is greater than zero. The Catchment model, in general, suggests the presence of snow only where ANN or SVM predictions are made available (i.e., as indicated by the solid lines). In Figures 4a and 4b, the Catchment model predicts the complete melt of the snow pack several weeks earlier than is suggested by the AMSR-E measurements. The exact cause of the discrepancy is currently unknown (and beyond the current scope of work for this study). It remains to be seen whether such errors could be corrected through data assimilation.

C. Output Variability

The findings presented above demonstrate the ability of an SVM to yield relatively unbiased AMSR-E Tb estimates with a modest amount of RMSE and significant skill (in terms of anomaly R) over synoptic and seasonal time scales. Further, it was shown the SVM improves upon the ANN, in general, at all frequency and polarization combinations examined in this study. An important question that remains, especially when viewed in the context of a data assimilation framework, is whether the SVM estimates can reasonably represent the spatiotemporal variability of the AMSR-E Tb measurements.

275 The bar plots in Figure 5 highlight the variability for both the ANN-derived and SVM-derived T b estimates. The 

276 corresponding variability in the AMSR-E measurements not used during training is also included. In addition to 

showing results for all of the frequencies and polarizations used in this study, the results are further stratified by snow class (Figure 1). Each of the six snow classes (tundra, taiga, maritime [abbreviated mari.], alpine, prairie, and ephemeral [abbreviated ephem.]) covers hundreds (or more) of EASE grid cells. For the purpose of this analysis, variability is first computed as the spatial standard deviation for each day when snow is present and then averaged in time over the 9-year study period. As shown in Figure 5, both the ANN and SVM variabilities agree quite well with the variability in the AMSR-E measurements for each frequency/polarization combination and for each snow class. However, the SVM agrees better with the AMSR-E measurements (relative to the ANN-based output) in almost every category. In addition, the SVM (and ANN) estimates capture many of the large-scale features seen in the AMSR-E measurements. For example, the variability in the horizontally-polarized estimates is generally greater than that of their vertically-polarized counterparts for a given snow class. This behavior can be partly explained by the increased sensitivity of horizontally-polarized $T_b$s to the presence of internal ice layers and surface crust [24], [28]. Further, the variability in both the ANN and SVM estimates (and AMSR-E measurements) is generally greatest in the tundra and taiga regions where the boreal forest is located, which suggests the forest influences contained within the AMSR-E measurements [22] are reproduced by the machine learning techniques. However, the SVM estimates clearly match the AMSR-E measurements more closely (relative to the ANN) for all frequency, polarization, and snow class combinations. The increased variability in the SVM estimates (relative to the ANN) corroborates the previous results that showed the SVM captures much more of the high-frequency variability at a given location (see Figure 4 for examples), which leads to the increased variability across space and time seen in Figure 5.
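The variability metric described above (a spatial standard deviation on each snow-covered day, followed by a time average) can be sketched as follows. This is a minimal illustration rather than the authors' code: the function name `class_variability`, the nested-list layout `tb[day][cell]`, the use of `None` for snow-free or missing cells, and the choice of the population standard deviation are all assumptions made here.

```python
from statistics import mean, pstdev

def class_variability(tb, min_cells=2):
    """Time-averaged spatial standard deviation of Tb for one snow class.

    tb[day][cell] holds Tb in kelvin for the grid cells of a single snow
    class; None marks cells without snow (or without data) on that day.
    Layout and names are hypothetical, chosen only for illustration.
    """
    daily_sigma = []
    for day in tb:
        values = [v for v in day if v is not None]
        if len(values) >= min_cells:            # need a spread to measure
            daily_sigma.append(pstdev(values))  # spatial std for this day
    return mean(daily_sigma) if daily_sigma else None

# Three days, four cells of a hypothetical snow class (values in K).
tb_demo = [
    [240.0, 244.0, None, 242.0],
    [250.0, 250.0, 250.0, 250.0],
    [None, None, None, 238.0],   # too few snow-covered cells; skipped
]
sigma = class_variability(tb_demo)
```

Applying the same function to the AMSR-E, ANN, and SVM series for each snow class and each frequency/polarization combination would yield the kind of bars summarized in Figure 5.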
D. Potential for Data Assimilation

The development of the SVM was originally motivated by its eventual use as an observation operator within a data assimilation framework [25], [29] for the purpose of merging AMSR-E $T_b$ measurements with SVM-derived $T_b$ estimates into an LSM. In order to assess the potential of the SVM within the data assimilation framework, a brief investigation of the error covariance structure between the LSM and the SVM-based $T_b$ estimates is presented here. The error covariance is computed as a gain matrix, K, which represents a weighted average of the uncertainty in the LSM-derived estimates of SWE along with the spectral difference in the $T_b$ estimates at 18V and 36V. The presence of a non-zero error covariance structure would suggest a degree of potential for a follow-on data assimilation study employing the SVM. The gain K is computed as

$$K = C_{yz}\left(C_{zz} + C_{vv}\right)^{-1} \tag{2}$$

where $C_{yz}$ is the (sample) cross-covariance between the ensemble of prior land model states and the SVM- or ANN-predicted $T_b$s, $C_{zz}$ is the (sample) covariance of the predicted $T_b$s, and $C_{vv} = 1~\mathrm{K}^2$ is the $T_b$ measurement error variance. The gain K is computed at each pixel between the modeled SWE and SVM-based estimates of $\Delta T_b = 18\mathrm{V} - 36\mathrm{V}$. This spectral difference is employed here as it is commonly used to estimate SWE [15] and serves to represent the linkage between SWE and $T_b$. In general, the larger the spectral difference, the greater the amount of SWE present [5]. Catchment model perturbations were implemented using the methods of [13] and performed in the same manner as in [14]. As a first approximation for demonstrating the potential for a non-zero error covariance structure, only the prior model estimate (without an analysis update step) is used here. Again, this simplified approach is intended only to demonstrate the potential for future inclusion in a data assimilation procedure.
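For a single pixel with one state variable (SWE) and one predicted observation (the 18V minus 36V difference), Equation (2) reduces to a ratio of scalars. The sketch below illustrates that scalar case under stated assumptions: the function names `sample_cov` and `scalar_gain` are hypothetical, the four-member ensemble is invented for illustration, and the unbiased (n - 1) covariance estimator is assumed rather than taken from the text.

```python
def sample_cov(a, b):
    """Unbiased sample covariance of two equally sized sequences."""
    n = len(a)
    if n != len(b) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / (n - 1)

def scalar_gain(swe_ens, dtb_ens, c_vv=1.0):
    """Scalar form of Eq. (2): K = C_yz / (C_zz + C_vv), in mm per K.

    swe_ens: prior ensemble of modeled SWE [mm] at one pixel.
    dtb_ens: matching ensemble of predicted 18V-36V differences [K].
    c_vv:    measurement error variance [K^2]; 1 K^2 as stated in the text.
    """
    c_yz = sample_cov(swe_ens, dtb_ens)
    c_zz = sample_cov(dtb_ens, dtb_ens)
    return c_yz / (c_zz + c_vv)

# Hypothetical four-member ensemble (values invented for illustration).
swe = [90.0, 100.0, 110.0, 120.0]   # mm
dtb = [8.0, 9.0, 11.0, 12.0]        # K
gain = scalar_gain(swe, dtb)
```

In this toy ensemble SWE and the spectral difference vary together, so the gain is positive; an ensemble in which they vary in opposite directions would give a negative gain of the same magnitude.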

Figure 6 shows the computed gain over snow-covered regions in the domain collocated with the presence of AMSR-E measurements on 1 February 2003, when SWE is near peak accumulation. The collocation with AMSR-E serves to highlight the spatial extent that could eventually be updated when using the Kalman filter. Figure 6a shows the gain using the ANN-based $T_b$ estimates whereas Figure 6b shows the computed gain using the SVM-based $T_b$ estimates. If the difference between the AMSR-E measurements and the $T_b$ estimates is +1 K, a gain of K = 10 mm K$^{-1}$ translates to an increase of 10 mm in the posterior (updated) modeled SWE. Alternatively, a negative gain (e.g., K = -10 mm K$^{-1}$) with a +1 K difference between the AMSR-E measurement and the estimated $T_b$ would result in a decrease of 10 mm in the posterior SWE estimate.
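Both cases follow from the scalar form of the standard Kalman update, in which the SWE increment equals the gain multiplied by the difference between the measurement and the estimate. The superscripts used below (minus for prior, plus for posterior, "obs" and "est" for the measured and estimated quantities) are labels introduced here for illustration only:

```latex
\begin{align*}
\mathrm{SWE}^{+} &= \mathrm{SWE}^{-} + K\,\bigl(T_b^{\mathrm{obs}} - T_b^{\mathrm{est}}\bigr), \\
K = +10~\mathrm{mm\,K^{-1}}:\quad
  \mathrm{SWE}^{+} - \mathrm{SWE}^{-} &= (+10~\mathrm{mm\,K^{-1}})(+1~\mathrm{K}) = +10~\mathrm{mm}, \\
K = -10~\mathrm{mm\,K^{-1}}:\quad
  \mathrm{SWE}^{+} - \mathrm{SWE}^{-} &= (-10~\mathrm{mm\,K^{-1}})(+1~\mathrm{K}) = -10~\mathrm{mm}.
\end{align*}
```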

The large-scale structures in Figures 6a and 6b are similar in that relatively large, positive gains are found in the northeastern portion of the domain as well as throughout much of the Rocky Mountains. In both cases, this is due to relatively small values of $C_{zz} + C_{vv}$ in conjunction with relatively large values of $C_{yz}$. However, significant differences also occur, as quantified by a modest pattern (spatial) correlation of R = 0.31 between the maps shown in Figure 6. More specifically, the ANN-derived gain in Figure 6a contains a series of unusual striations across the north-central portion of Canada. These striations are apparently the result of the limited sensitivity of the ANN output to small perturbations in the ensemble of ANN inputs. The result is that neighboring cells, at times, yield similar (or identical) values of K, which can result in the appearance of striated features. The SVM-based gain in Figure 6b, on the other hand, does not suffer from these striated features and, in turn, yields a smoother and more continuous estimate of the computed gain across space. The presence of negative gains is a particularly interesting feature given the first SWE retrieval algorithm originally presented by [5]. Namely, the earliest retrieval algorithm suggested a direct, linear relationship between SWE and $\Delta T_b$. However, the presence of both positive and negative gains shown here suggests a non-linear relationship between SWE and $\Delta T_b$. More work is required to better understand this non-linear behavior, but that work is beyond the scope of the present study. Even though this simple exercise is far from the in-depth investigation of error covariance planned for a follow-on study, it does serve to demonstrate that a non-zero error covariance exists between the modeled SWE and the estimated $\Delta T_b$ spectral difference and that this error structure could be leveraged within an ensemble filter framework in order to produce a merged (updated) model estimate of SWE that improves upon the original (prior) model estimate.
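The pattern (spatial) correlation quoted above can be illustrated with a short sketch. It assumes, without confirmation from the text, that the statistic is the unweighted Pearson correlation over cells where both gain maps are defined (no area weighting is applied, which seems reasonable on an equal-area grid). The function name `pattern_correlation` and the sample values are hypothetical.

```python
from math import sqrt

def pattern_correlation(map_a, map_b):
    """Pearson correlation over cells where both maps have a value.

    map_a, map_b: flat sequences of gains on the same grid, with None
    wherever a cell is not snow covered or lacks a collocated measurement.
    """
    pairs = [(a, b) for a, b in zip(map_a, map_b)
             if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two co-located cells")
    mean_a = sum(a for a, _ in pairs) / n
    mean_b = sum(b for _, b in pairs) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in pairs)
    var_a = sum((a - mean_a) ** 2 for a, _ in pairs)
    var_b = sum((b - mean_b) ** 2 for _, b in pairs)
    if var_a == 0.0 or var_b == 0.0:
        raise ValueError("correlation undefined for a constant map")
    return cov / sqrt(var_a * var_b)

# Invented gains [mm/K]; only the three cells valid in both maps are used.
gain_ann = [1.0, 2.0, None, 4.0, 3.0]
gain_svm = [2.0, 4.0, 5.0, 8.0, None]
r = pattern_correlation(gain_ann, gain_svm)
```

The toy maps are exact multiples of each other where both are defined, so the correlation is 1; the value of 0.31 reported for Figure 6 indicates far weaker spatial agreement between the ANN- and SVM-derived gains.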

V. Conclusions

An SVM was developed in order to estimate AMSR-E $T_b$ at specific frequencies and polarizations. The eventual use of the SVM is to serve as a measurement operator within an ensemble-based data assimilation framework for the purpose of improving SWE estimation at regional and continental scales. The model capability of the SVM was compared against an alternative form of machine learning, the ANN originally presented in [14]. Quantitative comparisons were made to highlight the skill of the SVM relative to that of the ANN. Both the SVM and ANN utilize output from the NASA Catchment model (forced with MERRA surface meteorological fields) as input for generating $T_b$ estimates. Horizontally- and vertically-polarized $T_b$s from AMSR-E at 10.65, 18.7, and 36.5 GHz supplied on a 25 km × 25 km resolution equal-area grid were used during training. Subsequent comparisons with SVM and ANN estimates employed AMSR-E measurements not used during the training activities so that independent verification could be conducted.

When averaged across the North American study domain over the course of a 9-year study period, SVM-derived $T_b$ estimates were found to be relatively unbiased (|bias| < 1 K), contain median RMSE values of less than 10 K, and possess skill that yielded anomaly R values on the order of 0.8. The SVM technique outperformed the ANN in every major snow class (as defined by [33]) with a notable increase in the ability to reproduce the high-frequency temporal variability present in the AMSR-E measurements. In addition, a brief inspection was made into the error covariance structure between modeled SWE (via the Catchment model) and a spectral difference in $T_b$ as computed by the two different machine learning techniques. The results showed the presence of a non-zero covariance structure, which could eventually be leveraged within a data assimilation framework in order to improve regional- and continental-scale SWE estimates.

In short, the trained SVM presented here is a superior alternative to the ANN originally presented in [14]. Even though the training data used by both techniques were identical, and all other relevant aspects of the learning process were held as equivalent as possible between the two machine learning techniques, it is clear that the SVM as applied in this study yields better performance. One hypothesis is that the SVM learning procedure focuses on a single frequency/polarization combination whereas the ANN simultaneously yields estimates for all of the frequency/polarization combinations. The reduction in the number of degrees of freedom in the ANN is a likely contributor to the reduced performance relative to the SVM. A series of tests (results not shown) using an ANN trained to estimate only a single frequency/polarization combination found improved performance relative to the multi-$T_b$ ANN presented in [14]. However, the level of improvement using the single-$T_b$ ANN framework still did not achieve the same degree of performance as found with the SVM. The increase in degrees of freedom in the SVM relative to the ANN, in part, helps explain why the SVM outperformed the ANN. An additional reason for this behavior could be the ANN's dependence on input from an “expert” user regarding its exact structure (e.g., number of layers, number of hidden nodes per layer) prior to training, which is not similarly required in the SVM setup.

As with the ANN presented in [14], it is worthwhile to highlight some of the limitations of the SVM presented here. First, the Catchment model used to generate the inputs to the SVM does not account for ice crust on the surface of the snowpack, internal ice layers within the snowpack, or sub-grid scale lake ice underlying the snowpack. Hence, the SVM-derived estimates do not explicitly account for these effects, which limits the skill of the SVM-based $T_b$ estimates. In addition, AMSR-E is no longer collecting measurements due to a problem associated with the rotation of the AMSR-E antenna. However, AMSR2 onboard the Japanese Global Change Observation Mission - Water (GCOM-W) satellite is currently collecting $T_b$ measurements at frequencies and polarizations comparable to those of AMSR-E before its malfunction. Preliminary results to be presented in a follow-on study suggest the SVM (and ANN) can be trained on AMSR-E measurements and then used to predict AMSR2 $T_b$s. That is, the machine learning technique can be used to estimate measurements from one sensor using training targets from another sensor. This transferability could enable a continuous record forward in time even though AMSR-E science data collection is inactive.

Despite some deficiencies in the SVM approach, it is worthwhile reiterating the skill in the SVM estimates and the clear improvements relative to the ANN-based approach in [14]. The SVM was shown to effectively reproduce AMSR-E $T_b$s at multiple frequencies and polarizations during both the accumulation phase, when the snowpack is relatively dry, and the ablation phase, when the snowpack is relatively wet. Significant skill was demonstrated in both shallow and deep snow environments, in areas with and without vegetative cover, and across all six major snow classes common across North America (and the northern hemisphere as a whole). Ongoing studies are investigating the sensitivity of SVM-derived estimates to snow-related variables (most notably SWE), and preliminary results suggest a considerable amount of sensitivity is present in the SVM across the majority of the study domain. These findings suggest a trained SVM could serve as an effective and computationally efficient measurement operator within the data assimilation procedure for which it was originally constructed.
Appendix

Consider an [m × n] training matrix, $\mathbf{x}$, and an [m × 1] vector of training targets, $\mathbf{z}$, such that $\{(\mathbf{x}_1, z_1), \ldots, (\mathbf{x}_m, z_m)\}$. In the context of this study, $\mathbf{x}$ represents n = 11 geophysical variables that characterize snow and near-surface environmental conditions at a given location and at m different times as derived from a land surface model simulation (further details provided in Section II-B). The vector $\mathbf{z}$ represents a corresponding series of m satellite-based measurements of PMW $T_b$ at a given frequency and polarization (further details provided in Section II-C1). Assume $\phi(\mathbf{x})$ is a nonlinear function that maps the geophysical inputs from the land surface model, $\mathbf{x}$, into $T_b$ space as

$$f(\mathbf{w}, \delta) = \langle \mathbf{w} \cdot \phi(\mathbf{x}) \rangle + \delta \tag{3}$$


where $\mathbf{w}$ is a vector of weights, $\langle \mathbf{w} \cdot \phi(\mathbf{x}) \rangle$ is the inner (dot) product of $\mathbf{w}$ and $\phi(\mathbf{x})$, and $\delta$ is a “bias” coefficient. For given parameters $C > 0$ and $\epsilon > 0$, the standard (primal) form of nonlinear support vector regression [6], [37] may be written as


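$$
\begin{aligned}
\underset{\mathbf{w},\,\delta,\,\xi,\,\xi^*}{\text{minimize}} \quad & \frac{1}{2}\langle \mathbf{w} \cdot \mathbf{w} \rangle + C \sum_{i=1}^{m} \left(\xi_i + \xi_i^*\right) \\
\text{subject to} \quad & \langle \mathbf{w} \cdot \phi(\mathbf{x}_i) \rangle + \delta - z_i \le \epsilon + \xi_i, \\
& z_i - \langle \mathbf{w} \cdot \phi(\mathbf{x}_i) \rangle - \delta \le \epsilon + \xi_i^*, \\
& \xi_i,\ \xi_i^* \ge 0, \quad i = 1, 2, \ldots, m
\end{aligned}
\tag{4}
$$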


where m is the available number of $T_b$ measurements in time (for a given location in space), $z_i$ is a $T_b$ measurement at time i, and $\xi_i$ and $\xi_i^*$ are slack variables. The values of $\mathbf{w}$, $\delta$, $\xi$, and $\xi^*$ are not specified a priori, but rather are determined as a result of the minimization process. The goal of the minimization procedure is to determine values for $\mathbf{w}$, $\delta$, $\xi$, and $\xi^*$ such that the mapped inputs (computed as $\langle \mathbf{w} \cdot \phi(\mathbf{x}_i) \rangle + \delta$) most closely agree with the training targets, $\mathbf{z}$, provided in $T_b$ space.
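The role of the slack variables in the primal form can be made concrete with a small sketch. For fixed weights and bias, the smallest feasible slack for each sample is the amount by which the residual exceeds the tolerance, which is the ε-insensitive loss. The function names and the numerical values below are hypothetical and serve only to illustrate how the objective is evaluated; nothing here solves the optimization.

```python
def eps_insensitive(residual, eps):
    """Zero inside the tube |residual| <= eps, linear outside it."""
    return max(0.0, abs(residual) - eps)

def primal_objective(w, predictions, targets, c, eps):
    """Value of the primal objective with each slack at its smallest
    feasible value, i.e. xi_i + xi_i* = max(0, |f(x_i) - z_i| - eps)."""
    regularizer = 0.5 * sum(wi * wi for wi in w)
    slack = sum(eps_insensitive(f - z, eps)
                for f, z in zip(predictions, targets))
    return regularizer + c * slack

# Invented example: three Tb predictions and targets in kelvin.
value = primal_objective(
    w=[3.0, 4.0],
    predictions=[250.0, 255.0, 260.0],
    targets=[251.0, 250.0, 260.0],
    c=10.0,
    eps=2.0,
)
```

Only the second sample lies outside the 2 K tube (by 3 K), so it alone contributes to the penalty term; the other two samples incur no loss.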

The primal optimization is commonly solved in dual form [6], [32], [37], [40], obtained by forming the Lagrangian of the primal form and differentiating with respect to the primal variables (i.e., $\mathbf{w}$, $\delta$, $\xi$, and $\xi^*$), as follows:

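$$
\begin{aligned}
\underset{\alpha,\,\alpha^*}{\text{minimize}} \quad & \frac{1}{2} \sum_{i,j=1}^{m} \left(\alpha_i - \alpha_i^*\right)\left(\alpha_j - \alpha_j^*\right) \langle \phi(\mathbf{x}_i) \cdot \phi(\mathbf{x}_j) \rangle \\
& + \epsilon \sum_{i=1}^{m} \left(\alpha_i + \alpha_i^*\right) - \sum_{i=1}^{m} z_i \left(\alpha_i - \alpha_i^*\right) \\
\text{subject to} \quad & \sum_{i=1}^{m} \left(\alpha_i - \alpha_i^*\right) = 0, \\
& \alpha_i,\ \alpha_i^* \in [0, C], \quad i = 1, 2, \ldots, m
\end{aligned}
\tag{5}
$$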
where $\alpha_i$ and $\alpha_i^*$ are a dual set of Lagrangian multipliers, $\langle \phi(\mathbf{x}_i) \cdot \phi(\mathbf{x}_j) \rangle$ is the inner (dot) product of $\phi(\mathbf{x}_i)$ and $\phi(\mathbf{x}_j)$, $\epsilon$ is the specified error tolerance, and C is a positive constant that dictates the penalized loss during SVM training. The Lagrangian multipliers, $\alpha_i$ and $\alpha_i^*$, are nonzero for points on or outside of the $\epsilon$-insensitive tube and vanish for points inside the $\epsilon$-insensitive tube. The points with nonzero Lagrangian multipliers comprise the so-called “support vectors”. The process described here is similar to that employed by an ANN, with the fundamental difference that, in the SVM, the weights (computed as $\alpha_i - \alpha_i^*$) are associated with a subset of the training patterns [32].

The computation of $\langle \phi(\mathbf{x}_i) \cdot \phi(\mathbf{x}_j) \rangle$ in feature space is often too complex to perform [32]. However, the computation may be conducted in input (land surface model) space using the kernel function $k(\mathbf{x}_i, \mathbf{x}_j) = \langle \phi(\mathbf{x}_i) \cdot \phi(\mathbf{x}_j) \rangle$ in order to yield the inner products in feature space, which avoids the computational infeasibility of directly evaluating the basis function in a high-dimensional feature space. In this particular study, a radial basis kernel function was employed that satisfies the expression $k(\mathbf{x}_i, \mathbf{x}_j) = \langle \phi(\mathbf{x}_i) \cdot \phi(\mathbf{x}_j) \rangle = \exp\{-\gamma \lVert \mathbf{x}_i - \mathbf{x}_j \rVert^2\}$, where $\mathbf{x}_i$ and $\mathbf{x}_j$ are single instances of $\mathbf{x}$ (in time and space), $\lVert \cdot \rVert$ represents the Euclidean norm, and $\gamma$ is proportional to the inverse square of the width parameter as described in [8]. The loss function was specified as $\epsilon$-insensitive [37], [38]. Quadratic, Laplace, and Huber loss functions were also tested, but none showed notable improvement over the $\epsilon$-insensitive loss function, which was therefore selected as the most appropriate. In addition, the regularization parameter, C, was defined as the range of the training targets (i.e., $C = \max[\mathbf{z}] - \min[\mathbf{z}]$) using the methods outlined in [23]. An alternative formulation based on [8] was tested using $C = 6\sigma_z$, where $\sigma_z$ is the standard deviation of the training targets, but no significant difference between the two definitions of C was found. Hence, the former approach was employed such that C was set equal to the range of the training targets. Once the solutions for $\alpha_i$ and $\alpha_i^*$ are found, estimates of $T_b$ can then be computed via Equation (1) using geophysical inputs (derived from the land surface model), $\mathbf{y}$, that are distinct from the training data, where $\mathbf{x}$ represents the training matrix and $\delta$ is computed as the average of the support vectors (i.e., the subset of the training data with nonzero Lagrangian multipliers).
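The kernel evaluation, the choice of C, and the final prediction step can be sketched as follows. Equation (1) is not reproduced in this section, so the standard kernel expansion for support vector regression is assumed here. The function names, the coefficient values standing in for $\alpha_i - \alpha_i^*$, and the two-variable inputs are all hypothetical; in practice the coefficients would come from solving the dual problem, which this sketch does not do.

```python
from math import exp

def rbf_kernel(x_i, x_j, gamma):
    """k(x_i, x_j) = exp(-gamma * ||x_i - x_j||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x_i, x_j))
    return exp(-gamma * sq_dist)

def regularization_from_range(targets):
    """C set to the range of the training targets, as in the text."""
    return max(targets) - min(targets)

def svr_predict(y, support_x, coeffs, delta, gamma):
    """Standard kernel expansion: sum_i coeffs[i] * k(x_i, y) + delta,
    where coeffs[i] stands for (alpha_i - alpha_i*)."""
    return delta + sum(c * rbf_kernel(x, y, gamma)
                       for x, c in zip(support_x, coeffs))

# Invented values: two support vectors with two input variables each.
support_x = [[0.0, 0.0], [1.0, 0.0]]
coeffs = [2.0, -1.0]        # placeholders for alpha_i - alpha_i*
c_value = regularization_from_range([230.0, 255.0, 241.5])   # K
tb_est = svr_predict([0.0, 0.0], support_x, coeffs, delta=250.0, gamma=0.5)
```

Because only the support vectors carry nonzero coefficients, the cost of a prediction scales with the number of support vectors rather than with the full training set.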
TABLE I
SVM INPUTS AND OUTPUTS (REPRODUCED FROM [14]).

Inputs                                Symbol
Top layer snow density                $\rho_{sn1}$
Middle layer snow density             $\rho_{sn2}$
Bottom layer snow density             $\rho_{sn3}$
Snow liquid water content°            SLWC
Snow water equivalent°                SWE
Near-surface air temperature          $T_{air}$
Near-surface soil temperature         $T_{p1}$
Skin temperature                      $T_{skin}$
Top layer snow temperature            $T_{sn1}$
Bottom layer snow temperature         $T_{sn3}$
Temperature gradient index            TGI

Outputs                               Symbol
$T_b$ at 10.65 GHz, H-polarization    10H
$T_b$ at 10.65 GHz, V-polarization    10V
$T_b$ at 18.7 GHz, H-polarization     18H
$T_b$ at 18.7 GHz, V-polarization     18V
$T_b$ at 36.5 GHz, H-polarization     36H
$T_b$ at 36.5 GHz, V-polarization     36V

° = Column-integrated quantity
Fig. 1. Study domain encompassing North America poleward of 32° N. Colors represent the snow classification of [33]. The three red circles represent the locations of the time series comparisons shown in Figure 4.



Fig. 2. (Top row) bias, (middle row) RMSE, and (bottom row) anomaly R for the ANN and SVM (versus AMSR-E observations not used 
during training) at 18V for the time period 1 September 2002 through 1 September 2011. Results include (left column) SVM metrics, (middle 
column) ANN metrics, and (right column) computed difference between SVM and ANN metrics. 


Fig. 3. Statistical box plots of a) RMSE and b) anomaly R across the North American domain for the ANN and SVM from 1 September 
2002 through 1 September 2011. Statistics are computed relative to AMSR-E measurements not used during training. Each box represents the 
median along with the 25th- and 75th-percentiles while the whiskers illustrate the 10th- and 90th-percentiles. 
Fig. 4. Time series from 1 September 2003 through 1 June 2004 including AMSR-E observations, ANN estimates, and SVM estimates at 18V and 36V. a) Location with shallow snow depth and no forest cover (max. SWE = 0.07 cm; FF = 0.0; lat = 66.5°; lon = -66.7°). b) Location with moderate snow depth and high forest cover (max. SWE = 0.13 m; FF = 0.89; lat = 52.4°; lon = -85.1°). c) Location with large snow depth and modest forest fraction (max. SWE = 0.22 m; FF = 0.02; lat = 67.6°; lon = -151.6°). See also Figure 1 for locations.


January 4, 2014 


DRAFT 




IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING, VOL. XX, NO. XX, XXX 2014



[Figure 5 panels: bar charts by snow class; x-axis categories are Tundra, Taiga, Mari., Alpine, Prairie, and Ephem.]

Fig. 5. Spatial variability ($\sigma$) of (black) AMSR-E, (dark gray) ANN, and (light gray) SVM $T_b$s time-averaged by snow class according to [33] for the 9-year study period for all frequency and polarization combinations.
Fig. 6. Kalman gain for SWE versus $\Delta T_b = 18\mathrm{V} - 36\mathrm{V}$ near peak SWE accumulation on 1 February 2003 for a) ANN-derived estimates and b) SVM-derived estimates.